Audio decoder with an adaptive frequency domain downmixer

ABSTRACT

A method and apparatus for decoding a multi-channel audio bitstream in which an adaptive frequency domain downmixer (3) is used to downmix, according to long and shorter transform block length information (17), the decoded frequency coefficients of the multi-channel audio (12,13,14,15) such that the long and shorter transform block information is maintained separately within the mixed down left and right channels. In this way, the long and shorter transform block coefficients of the mixed down left and right channels can be inverse transformed adaptively (4,5,6,7) according to the long and shorter transform block information, and the results of the inverse transform of the long and shorter blocks of each of the left and right channels are added together (8,9) to form the total mixed down output of the left and right channels.

FIELD OF THE INVENTION

This invention relates to multi-channel digital audio decoders for digital storage media and transmission media.

BACKGROUND ART

An efficient multi-channel digital audio signal coding method has been developed for storage or transmission applications such as the digital video disc (DVD) player and the high definition digital TV receiver (set-top-box). A description of the standard can be found in the ATSC Standard, “Digital Audio Compression (AC-3) Standard”, Document A/52, Dec. 20, 1995. The standard defines a coding method for up to six channels of multi-channel audio, that is, the left, right, centre, surround left, surround right, and low frequency effects (LFE) channels.

In this coding method, the multi-channel digital audio source is compressed block by block at the encoder by first transforming each input block of audio PCM samples into frequency coefficients using an analysis filter bank, then quantizing the resulting frequency coefficients into quantized coefficients with a determined bit allocation strategy, and finally formatting and packing the quantized coefficients and bit allocation information into a bitstream for storage or transmission.

Depending upon the spectral and temporal characteristics of the audio source, adaptive transformation of the audio source is done at the encoder to optimize the frequency/time resolution. This is achieved by adaptive switching between two transformations, one with a long transform block length and one with a shorter transform block length. The long transform block length, which has good frequency resolution, is used for improved coding performance; the shorter transform block length, which has greater time resolution, is used for audio input signals which change rapidly in time.

At the decoder side, each audio block is decompressed from the bitstream by first determining the bit allocation information, then unpacking and de-quantizing the quantized coefficients, and inverse transforming the resulting coefficients based on the determined long or shorter transform length to output audio PCM data. The decoding processes are performed for each channel in the multi-channel audio data.

For reasons such as overall system cost constraints or physical limitations on the number of output loudspeakers that can be used, downmixing of the decoded multi-channel audio is performed so that the number of output channels at the decoder is reduced to two channels, namely the left and right (L_(m) and R_(m)) channels suitable for conventional stereo audio amplifier and loudspeaker systems.

Basically, downmixing is performed such that the multi-channel audio information is preserved while the number of output channels is reduced to only two channels. The method of downmixing may be described as:

L_(m) = a₀L + a₁R + a₂C + a₃L_(S) + a₄R_(S) + a₅LFE

R_(m) = b₀L + b₁R + b₂C + b₃L_(S) + b₄R_(S) + b₅LFE

where

L_(m): Mixed down left channel output

R_(m): Mixed down right channel output

L: Left channel input

R: Right channel input

C: Centre channel input

L_(S): Surround left channel input

R_(S): Surround right channel input

LFE: Low frequency effects channel input

a₀₋₅: downmixing coefficients for left channel output

b₀₋₅: downmixing coefficients for right channel output.

The downmixing method or coefficients may be designed such that the original, or an approximation of the original, decoded multi-channel signals may be derived from the mixed down left and right channels.
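
The downmixing equations above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `downmix_stereo`, the channel ordering (L, R, C, surround left, surround right, LFE) and the coefficient values are assumptions chosen for the example and are not taken from the specification.

```python
def downmix_stereo(channels, a, b):
    """Mix per-channel PCM sample lists down to (L_m, R_m).

    channels: list of equal-length sample lists, one per input channel
    a, b:     downmixing coefficients for the left and right outputs
    """
    n_samples = len(channels[0])
    left = [sum(a[i] * ch[k] for i, ch in enumerate(channels))
            for k in range(n_samples)]
    right = [sum(b[i] * ch[k] for i, ch in enumerate(channels))
             for k in range(n_samples)]
    return left, right


# Hypothetical coefficients, order: L, R, C, Ls, Rs, LFE (illustrative values only).
a = [1.0, 0.0, 0.5, 0.5, 0.0, 0.0]
b = [0.0, 1.0, 0.5, 0.0, 0.5, 0.0]

# One sample per channel keeps the arithmetic easy to follow.
channels = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
left, right = downmix_stereo(channels, a, b)
```

In this form every input channel must already be available as PCM samples, which is why a conventional decoder needs one inverse transform per coded channel before the mix.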

For decoders in systems or applications where downmixing is required, the decoding processes, which include the inverse transformation, are required for all encoded channels before downmixing can be done to generate the two output channels. The implementation complexity and the computation load are not reduced for such present art decoders even though only two output channels are generated instead of all channels in the multi-channel bitstream.

To significantly reduce the implementation complexity and the computation load, the downmixing process should be performed at an early stage within the decoding processes such that the number of channels required to be decoded is reduced for the remaining decoding processes. In particular, since the inverse transform process is a complex and computationally intensive process, the downmixing should be performed on the inverse quantized frequency coefficients before the inverse transform. One example of such a solution is given in U.S. Pat. No. 5,400,433, for which the inverse transform process was assumed to be linear. Another example is referred to in an article by Steve VERNON, “Design and Implementation of AC-3 Coders”, IEEE Transactions on Consumer Electronics, vol. 41, no. 3, August 1995, NEW YORK US, pages 754-759. Again, downmixing in the frequency domain is disclosed, but only in the case where block switching is not used.

Because the inverse transform process of the present art is adaptive in long or shorter transform block length depending upon the spectral and temporal characteristics of each coded audio channel, it is not a linear process across channels and therefore the known downmixing process cannot be performed first. That is, combining the channels before the inverse transform process will not produce the same output that is produced by combining the channels after the inverse transform process.
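
The problem can be illustrated with a small numerical sketch. The two functions below are toy stand-ins, assumed purely for illustration: `inverse_long` is a single N-point cosine synthesis and `inverse_short` applies two half-length syntheses to interleaved coefficients. They are not the AC-3 transforms, but like them each is linear on its own while the two differ from each other.

```python
import math


def inverse_long(coeffs):
    """Toy long-block inverse transform: one N-point cosine synthesis."""
    n = len(coeffs)
    return [sum(c * math.cos(math.pi * (t + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for t in range(n)]


def inverse_short(coeffs):
    """Toy shorter-block inverse transform: two half-length syntheses
    applied to the even- and odd-indexed coefficients in turn."""
    return inverse_long(coeffs[0::2]) + inverse_long(coeffs[1::2])


def add(x, y):
    """Element-wise sum of two equal-length lists."""
    return [p + q for p, q in zip(x, y)]


x = [1.0, 0.0, 0.0, 0.0]  # coefficients of a channel coded as a long block
y = [0.0, 1.0, 0.0, 0.0]  # coefficients of a channel coded as a shorter block

# Reference: inverse transform each channel with its own transform, then mix.
reference = add(inverse_long(x), inverse_short(y))

# Naive frequency-domain mix: add coefficients first, then one long transform.
naive = inverse_long(add(x, y))

# Largest sample-wise difference between the two results.
mismatch = max(abs(p - q) for p, q in zip(reference, naive))
```

Here `mismatch` is clearly non-zero, so mixing coefficients of a long block and a shorter block before a single inverse transform changes the output.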

DISCLOSURE OF THE INVENTION

It is an object of this invention to provide a method and apparatus for decoding a multi-channel audio bitstream which will overcome or at least ameliorate the foregoing disadvantages.

In the present invention, an adaptive frequency domain downmixer is used to downmix, according to the long and shorter transform block length information, the decoded frequency coefficients of the multi-channel audio such that the long and shorter transform block information is maintained separately within the mixed down left and right channels. In this way, the long and shorter transform block coefficients of the mixed down left and right channels can still be inverse transformed adaptively according to the long and shorter transform block information, and the results of the inverse transform of the long and shorter blocks of each of the left and right channels are added together to form the total mixed down output of the left and right channels.

Accordingly, in a first aspect, this invention provides a method of decoding a multi-channel audio bitstream comprising the steps of subjecting said multi-channel audio bitstream to a block decoding process to obtain frequency coefficients for each audio channel within each block in said multi-channel audio bitstream, unpacking long and shorter transform block information for each audio channel within said block from said multi-channel audio bitstream, and determining downmixing coefficients for each audio channel within said multi-channel audio bitstream, the method including the steps of:

(a) downmixing said frequency coefficients of each audio channel within said block which are identified as long transform block by said long and shorter transform block information to form a left mixed down for long transform block and a right mixed down for long transform block;

(b) downmixing said frequency coefficients of each audio channel within said block which are identified as shorter transform block by said long and shorter transform block information to form a left mixed down for shorter transform block and a right mixed down for shorter transform block;

(c) inverse transforming each of said left mixed down for long transform block, said right mixed down for long transform block, said left mixed down for shorter transform block, and said right mixed down for shorter transform block to produce a left mixed down long inverse transformed block, a right mixed down long inverse transformed block, a left mixed down shorter inverse transformed block, and a right mixed down shorter inverse transformed block respectively;

(d) adding said left mixed down long inverse transformed block and said left mixed down shorter inverse transformed block to form a left total mixed down; and

(e) adding said right mixed down long inverse transformed block and said right mixed down shorter inverse transformed block to form a right total mixed down.

In a second aspect, this invention provides an apparatus for decoding a multi-channel audio bitstream comprising means for block decoding said multi-channel audio bitstream to obtain frequency coefficients of each audio channel within each block, means for unpacking long and shorter transform block information for each audio channel within said block, and means for determining downmixing coefficients for each audio channel within said multi-channel audio bitstream, the apparatus including:

(a) means for downmixing said frequency coefficients of each audio channel identified as long transform block by said long and shorter transform block information to form a left mixed down for long transform block and a right mixed down for long transform block;

(b) means for downmixing said frequency coefficients of each audio channel identified as shorter transform block by said long and shorter transform block information to form a left mixed down for shorter transform block and a right mixed down for shorter transform block;

(c) means for inverse transforming each of said left mixed down for long transform block, said right mixed down for long transform block, said left mixed down for shorter transform block, and said right mixed down for shorter transform block to produce a left mixed down long inverse transformed block, a right mixed down long inverse transformed block, a left mixed down shorter inverse transformed block, and a right mixed down shorter inverse transformed block respectively;

(d) means for adding said left mixed down long inverse transformed block and said left mixed down shorter inverse transformed block to form a left total mixed down; and

(e) means for adding said right mixed down long inverse transformed block and said right mixed down shorter inverse transformed block to form a right total mixed down.

Preferably, the block decoding process includes:

(a) parsing said multi-channel audio bitstream to obtain bit allocation information on each audio channel within said block;

(b) unpacking quantized frequency coefficients from said block using said bit allocation information; and

(c) de-quantizing said quantized frequency coefficients to obtain said frequency coefficients using said bit allocation information.

A post-processing step is also preferably performed in which:

(a) the left total mixed down is subjected to a window overlap/add process wherein the samples within the left total mixed down are weighted, de-interleaved, overlapped and added to samples of a previous block;

(b) the right total mixed down is subjected to a window overlap/add process wherein the samples within the right total mixed down are weighted, de-interleaved, overlapped and added to samples of a previous block; and

(c) the results of the window overlap/add are subjected to an output process wherein the results of the window overlap/add process are formatted and outputted.
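
A generic window overlap/add step of the kind listed above might be sketched as follows. This is an assumption-laden simplification: the function name `window_overlap_add`, the block layout (2N samples in, N samples out) and the window values are hypothetical, and the de-interleaving step mentioned in the text is omitted. The exact procedure for an AC-3 bitstream is defined in the ATSC standard.

```python
def window_overlap_add(block, window, delay):
    """One overlap/add step for a single mixed down channel.

    block:  2*N inverse transformed samples of the current block
    window: 2*N window weights (hypothetical values in the example below)
    delay:  N samples carried over from the previous block
    Returns (N output PCM samples, N samples to carry to the next block).
    """
    n = len(delay)
    # Weight every sample of the current block by the window.
    weighted = [s * w for s, w in zip(block, window)]
    # Overlap the first half with the tail kept from the previous block.
    output = [weighted[k] + delay[k] for k in range(n)]
    # Keep the second half for the next call.
    new_delay = weighted[n:]
    return output, new_delay


pcm, carry = window_overlap_add(
    block=[1.0, 2.0, 3.0, 4.0],
    window=[0.5, 0.5, 0.5, 0.5],
    delay=[10.0, 20.0],
)
```

Because the downmix has already been done, this step runs only twice per audio block, once for each of the two total mixed down channels.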

According to a preferred embodiment of the present invention, an input coded bitstream of multi-channel audio is first parsed and the bit allocation information for each audio channel block is decoded. With the bit allocation information, the quantized frequency coefficients of each audio channel block are unpacked from the bitstream and de-quantized. The de-quantized frequency coefficients of all audio channels of a block are then mixed down. This downmixing is done separately for audio channel blocks that are of long transform block length and of shorter transform block length; hence, four blocks of mixed down transform coefficients are formed: the left mixed down for long transform block, the left mixed down for shorter transform block, the right mixed down for long transform block, and the right mixed down for shorter transform block.

The four blocks of mixed down transform coefficients are subjected to the respective inverse transform for long transform block and shorter transform block. At the end of the inverse transform, the non-linearity between the long and shorter transform blocks is removed. The results of the inverse transform of the left mixed down for long transform block and the left mixed down for shorter transform block are added together to form the total mixed down left channel signal. Similarly, the total mixed down right channel signal is formed. Any further post-processing required can then be performed on only these two total mixed down channels, and the final results are outputted as audio PCM samples for the left and right channels.
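
The equivalence described above can be checked numerically with a small sketch for one output channel. The transforms `inverse_long` and `inverse_short` are toy stand-ins assumed for illustration (each linear, but different from each other), and all function names and test values are hypothetical; they are not the transforms of the ATSC standard.

```python
import math


def inverse_long(coeffs):
    """Toy long-block inverse transform: one N-point cosine synthesis."""
    n = len(coeffs)
    return [sum(c * math.cos(math.pi * (t + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for t in range(n)]


def inverse_short(coeffs):
    """Toy shorter-block inverse transform on interleaved coefficients."""
    return inverse_long(coeffs[0::2]) + inverse_long(coeffs[1::2])


def add_scaled(acc, gain, values):
    """Return acc + gain * values, element by element."""
    return [p + gain * q for p, q in zip(acc, values)]


def decode_then_downmix(channels, is_long, gains):
    """Conventional path: one inverse transform per coded channel."""
    out = [0.0] * len(channels[0])
    for coeffs, long_flag, gain in zip(channels, is_long, gains):
        pcm = inverse_long(coeffs) if long_flag else inverse_short(coeffs)
        out = add_scaled(out, gain, pcm)
    return out


def downmix_then_decode(channels, is_long, gains):
    """Adaptive path: mix long and shorter blocks separately, then
    run one long and one shorter inverse transform and add the results."""
    mixed_long = [0.0] * len(channels[0])
    mixed_short = [0.0] * len(channels[0])
    for coeffs, long_flag, gain in zip(channels, is_long, gains):
        if long_flag:
            mixed_long = add_scaled(mixed_long, gain, coeffs)
        else:
            mixed_short = add_scaled(mixed_short, gain, coeffs)
    return add_scaled(inverse_long(mixed_long), 1.0, inverse_short(mixed_short))


channels = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0], [4.0, 0.0, -2.0, 1.0]]
is_long = [True, False, True]
gains = [1.0, 0.5, 0.25]  # hypothetical downmixing coefficients

conventional = decode_then_downmix(channels, is_long, gains)
adaptive = downmix_then_decode(channels, is_long, gains)
max_error = max(abs(p - q) for p, q in zip(conventional, adaptive))
```

Under these assumptions `max_error` is zero up to floating point rounding, while the adaptive path uses two inverse transforms per output channel regardless of how many channels were coded.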

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described by way of example only, with reference to the accompanying drawings in which:

FIG. 1 is a block diagram of the audio decoder according to one embodiment of the present invention;

FIG. 2 is a block diagram of one embodiment of an adaptive frequency domain downmixer forming part of the decoder shown in FIG. 1;

FIG. 3 is a block diagram of another embodiment of the adaptive frequency domain downmixer shown in FIG. 2; and

FIG. 4 is a block diagram of an alternate embodiment of the inverse transform and post-processing processes forming part of the present invention.

BEST MODES FOR CARRYING OUT THE INVENTION

An audio decoder with an adaptive frequency domain downmixer according to a preferred embodiment of the present invention is shown in FIG. 1. An input multi-channel audio bitstream is first decoded by a bitstream unpack and bit allocation decoder 1. An example of the input multi-channel audio bitstream is the compressed bitstream according to the ATSC Standard, “Digital Audio Compression (AC-3) Standard”, Document A/52, Dec. 20, 1995. This input AC-3 bitstream consists of coded information of up to six channels of audio signal including the left channel (L), the right channel (R), the centre channel (C), the left surround channel (L_(S)), the right surround channel (R_(S)), and the low frequency effects channel (LFE). However, the maximum number of coded audio channels for the input is not limited. The coded information within the AC-3 bitstream is divided into frames of 6 audio blocks, and each of the 6 audio blocks contains the information for all of the coded audio channel blocks (i.e. L, R, C, L_(S), R_(S) and LFE).

In the bitstream unpack and bit allocation decoder 1, the input multi-channel audio bitstream is parsed and decoded to obtain the bit allocation information for each coded audio channel block. With the bit allocation information, the quantized frequency coefficients of each coded audio channel block are decoded from the input multi-channel audio bitstream. An example embodiment of the bitstream unpack and bit allocation decoder 1 may be found in the ATSC (AC-3) standard. The decoded quantized frequency coefficients of each coded audio channel block are inverse quantized by the de-quantizer 2 to produce the frequency coefficients 16 of the corresponding coded audio channel block. Details of the de-quantizer 2 for an AC-3 bitstream are found in the ATSC (AC-3) standard specification.

After generating the frequency coefficients of each or all of the audio channel blocks, the frequency coefficients are mixed down in the adaptive frequency domain downmixer 3 based on the long/shorter transform block information 17 extracted from the input bitstream to produce four blocks of mixed down frequency coefficients consisting of the left mixed down for long transform block 12 (L_(ML)), the left mixed down for shorter transform block 13 (L_(MS)), the right mixed down for long transform block 14 (R_(ML)), and the right mixed down for shorter transform block 15 (R_(MS)). The L_(ML) 12 and L_(MS) 13 are subjected to inverse transform for long transform block 4 and inverse transform for shorter transform block 5 respectively, and the results are added together by the adder 8. Similarly, the R_(ML) 14 and R_(MS) 15 are subjected to inverse transform for long transform block 6 and inverse transform for shorter transform block 7 respectively, and the results are added together by the adder 9. The results of adder 8 and adder 9 are subjected to post-processing 10 and post-processing 11 respectively, and finally outputted as output mixed down left channel 18 and output mixed down right channel 19.

An embodiment of the adaptive frequency domain downmixer 3 is shown in FIG. 2. In this embodiment, the frequency coefficients (numeral 16 in FIG. 1) of an audio block are supplied in demultiplexed form CH₀ to CH₅ (numerals 100 to 105) with respect to the six audio channels. The long and shorter transform block information (numeral 17 in FIG. 1) is also supplied in demultiplexed form LS₀ to LS₅ (numerals 106 to 111) with respect to the six audio channels. The input frequency coefficients CH₀ to CH₅ are first multiplied by the respective downmixing coefficients a₀ to a₅ and b₀ to b₅ (numerals 20 to 31) with multipliers (numerals 32 to 43). The downmixing coefficients are determined either by the application or by information from the input bitstream. The switches (numerals 44 to 55) are used to switch, according to the long and shorter transform block information LS₀ to LS₅ of each of the audio channels, the results of the multipliers (numerals 32 to 43) to the corresponding summator for L_(ML) 56, summator for L_(MS) 57, summator for R_(ML) 58, and summator for R_(MS) 59. The results of the summator for L_(ML) 56, summator for L_(MS) 57, summator for R_(ML) 58, and summator for R_(MS) 59 are outputted as L_(ML) 12, L_(MS) 13, R_(ML) 14, and R_(MS) 15, respectively. The overall operations of this embodiment can be described in the following equations:

$\begin{aligned} L_{ML} &= \sum\limits_{i = 0}^{n}\left( a_{i} \times {CH}_{i} \times {LS}_{i} \right) \\ L_{MS} &= \sum\limits_{i = 0}^{n}\left( a_{i} \times {CH}_{i} \times \overline{{LS}_{i}} \right) \\ R_{ML} &= \sum\limits_{i = 0}^{n}\left( b_{i} \times {CH}_{i} \times {LS}_{i} \right) \\ R_{MS} &= \sum\limits_{i = 0}^{n}\left( b_{i} \times {CH}_{i} \times \overline{{LS}_{i}} \right) \end{aligned}$

where LS_(i) is the “Boolean” (0=shorter, 1=long) representation of the long and shorter transform block information for each of the channels i=0 to n.
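
The four equations above translate directly into code. The following is a minimal sketch; the function name `adaptive_downmix` and the argument layout are assumptions made for the example, and `1 - ls[i]` plays the role of the complemented flag.

```python
def adaptive_downmix(ch, ls, a, b):
    """Form the four mixed down coefficient blocks.

    ch: list of per-channel frequency coefficient lists (CH_i)
    ls: list of flags, 1 = long transform block, 0 = shorter (LS_i)
    a:  downmixing coefficients for the left output (a_i)
    b:  downmixing coefficients for the right output (b_i)
    Returns (L_ML, L_MS, R_ML, R_MS).
    """
    n_bins = len(ch[0])
    l_ml = [0.0] * n_bins
    l_ms = [0.0] * n_bins
    r_ml = [0.0] * n_bins
    r_ms = [0.0] * n_bins
    for i, coeffs in enumerate(ch):
        long_flag = ls[i]
        short_flag = 1 - ls[i]  # complement of LS_i
        for k, c in enumerate(coeffs):
            l_ml[k] += a[i] * c * long_flag
            l_ms[k] += a[i] * c * short_flag
            r_ml[k] += b[i] * c * long_flag
            r_ms[k] += b[i] * c * short_flag
    return l_ml, l_ms, r_ml, r_ms


# Three channels, two coefficients each; values are illustrative only.
l_ml, l_ms, r_ml, r_ms = adaptive_downmix(
    ch=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    ls=[1, 0, 1],
    a=[1.0, 0.0, 0.5],
    b=[0.0, 1.0, 0.5],
)
```

A practical implementation would likely branch on the flag rather than multiply by it, which is what the switches of FIG. 2 do; the multiplicative form is kept here only to mirror the equations.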

It should be noted that the number of audio channels in the present embodiment is not limited to six, and can be expanded by increasing the number of multipliers and switches for the additional channels.

Another embodiment of the adaptive frequency domain downmixer 3 is shown in FIG. 3. The input frequency coefficients 16 are provided in sequence of the coded audio channel blocks as CH_(i), where i is the current audio channel number. The input CH_(i) is multiplied by the corresponding downmixing coefficients a_(i) 76 and b_(i) 77 using multipliers 60 and 61 respectively, and the results are switched according to the long and shorter transform block information LS_(i) 17 of the current audio channel block. If the current audio channel block is a long transform block, the results of the multipliers 60 and 61 are accumulated to the buffer for L_(ML) 68 and the buffer for R_(ML) 70 respectively using the adders 64 and 66. On the other hand, if the current audio channel block is a shorter transform block, the results of the multipliers 60 and 61 are accumulated to the buffer for L_(MS) 69 and the buffer for R_(MS) 71 respectively using the adders 65 and 67. After all the frequency coefficients of an audio block are received and processed, the results in the buffers for L_(ML), L_(MS), R_(ML), and R_(MS) are outputted with control Output_(M) 79 as L_(ML) 12, L_(MS) 13, R_(ML) 14, and R_(MS) 15 respectively using switches 72, 73, 74 and 75.
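
The sequential arrangement of FIG. 3 can be sketched as a small accumulator that receives one channel block at a time. The class name `SequentialDownmixer`, its method names and the dictionary keys are hypothetical and chosen only for this illustration.

```python
class SequentialDownmixer:
    """Accumulates channel blocks one by one into four buffers."""

    def __init__(self, n_bins):
        self.n_bins = n_bins
        self._reset()

    def _reset(self):
        # One buffer each for L_ML, L_MS, R_ML and R_MS.
        self.buffers = {name: [0.0] * self.n_bins
                        for name in ("L_ML", "L_MS", "R_ML", "R_MS")}

    def push(self, coeffs, is_long, a_i, b_i):
        """Multiply the current channel block by a_i and b_i and
        accumulate into the long or shorter buffers per the flag."""
        left = self.buffers["L_ML" if is_long else "L_MS"]
        right = self.buffers["R_ML" if is_long else "R_MS"]
        for k, c in enumerate(coeffs):
            left[k] += a_i * c
            right[k] += b_i * c

    def output(self):
        """Hand over the four buffers once all channels of the audio
        block have been pushed, and clear the state for the next block."""
        result = self.buffers
        self._reset()
        return result


mixer = SequentialDownmixer(n_bins=2)
mixer.push([1.0, 2.0], is_long=True, a_i=1.0, b_i=0.0)
mixer.push([3.0, 4.0], is_long=False, a_i=0.0, b_i=1.0)
mixer.push([5.0, 6.0], is_long=True, a_i=0.5, b_i=0.5)
mixed = mixer.output()
```

Compared with the parallel arrangement of FIG. 2, this form needs only one pair of multipliers and four buffers, at the cost of processing the channels serially.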

FIG. 4 shows an alternate embodiment of the inverse transform and post-processing processes. Using the L/R select signal 88 and switches 80 and 85, the input mixed down frequency coefficients L_(ML) 12 and L_(MS) 13 of an audio block are first inverse transformed with the respective inverse transform for long transform block 81 and inverse transform for shorter transform block 82. The results of the two inverse transforms are added together by adder 83 and then subjected to post-processing 84 before being outputted to the left channel output buffer 86. Subsequently, the L/R select signal 88 is changed, and the input mixed down frequency coefficients R_(ML) 14 and R_(MS) 15 are inverse transformed with the respective inverse transform for long transform block 81 and inverse transform for shorter transform block 82. The results of the two inverse transforms are added together by adder 83 and then subjected to post-processing 84 before being outputted to the right channel output buffer 87. Finally, the decompressed audio signals, output mixed down left channel 18 and output mixed down right channel 19, are sent out from the left channel output buffer 86 and right channel output buffer 87 respectively.
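
The time-shared arrangement of FIG. 4 amounts to running the same pair of inverse transforms and the same post-processing twice, once per setting of the L/R select signal. In the sketch below the function name `decode_time_shared` and the dictionary keys are hypothetical, and the transforms and post-processing are passed in as parameters because their details are defined elsewhere.

```python
def decode_time_shared(mixed, inverse_long, inverse_short, post_process):
    """Reuse one long transform, one shorter transform, one adder and
    one post-processing stage for the left and then the right channel.

    mixed: dict with keys "L_ML", "L_MS", "R_ML", "R_MS"
    Returns (left output, right output).
    """
    output_buffers = {}
    for select in ("L", "R"):  # plays the role of the L/R select signal
        long_pcm = inverse_long(mixed[select + "_ML"])
        short_pcm = inverse_short(mixed[select + "_MS"])
        total = [p + q for p, q in zip(long_pcm, short_pcm)]
        output_buffers[select] = post_process(total)
    return output_buffers["L"], output_buffers["R"]


# Stand-in stages, used only to make the data flow visible.
left, right = decode_time_shared(
    {"L_ML": [1.0, 2.0], "L_MS": [3.0, 4.0],
     "R_ML": [5.0, 6.0], "R_MS": [7.0, 8.0]},
    inverse_long=lambda x: [2.0 * v for v in x],
    inverse_short=lambda x: [-v for v in x],
    post_process=lambda x: list(x),
)
```

This trades hardware for time: half as many transform units as in FIG. 1, each used twice per audio block.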

Examples of the inverse transform for long transform block (numerals 4 and 6 of FIG. 1 and numeral 81 of FIG. 4) and the inverse transform for shorter transform block (numerals 5 and 7 of FIG. 1 and numeral 82 of FIG. 4) can be found in the ATSC (AC-3) standard specification. An example embodiment of the post-processing module (numerals 10 and 11 of FIG. 1 and numeral 84 of FIG. 4), consisting of window, overlap/add, scaling and quantization, can also be found in the ATSC (AC-3) standard specification.

It will be apparent that by maintaining the long and shorter transform block coefficients separately, downmixing can be performed in the frequency domain in a multi-channel audio decoder with adaptive long and shorter transform block coded input bitstream. As this adaptive downmixing is performed before the inverse transform, the number of inverse transforms per audio block is reduced to four instead of the number of coded audio channels; hence, if the number of coded audio channels in the input bitstream to the multi-channel audio decoder is six to eight channels, the number of inverse transforms required is reduced by two to four. This represents a significant reduction in implementation complexity and computation load requirement.
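
The saving stated above is simple arithmetic, sketched here for completeness. The function name `inverse_transforms_saved` is hypothetical; the count of four mixed down blocks comes from the description.

```python
def inverse_transforms_saved(n_coded_channels, n_mixed_blocks=4):
    """Inverse transforms saved per audio block when the adaptive
    frequency domain downmix replaces per-channel decoding."""
    return n_coded_channels - n_mixed_blocks


saved_six = inverse_transforms_saved(6)    # six coded channels
saved_eight = inverse_transforms_saved(8)  # eight coded channels
```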

The foregoing describes only some embodiments of the invention, and modifications can be made without departing from the scope of the invention.

The claims defining the invention are as follows:
 1. A method of decoding a multi-channel audio bitstream comprising the steps of subjecting said multi-channel audio bitstream to a block decoding process to obtain frequency coefficients for each audio channel within each block in said multi-channel audio bitstream, unpacking long and shorter transform block information for each audio channel within said block from said multi-channel audio bitstream, and determining downmixing coefficients for each audio channel within said multi-channel audio bitstream, the method including the steps of: (a) downmixing said frequency coefficients of each audio channel within said block which are identified as long transform block by said long and shorter transform block information to form a left mixed down for long transform block and a right mixed down for long transform block; (b) downmixing said frequency coefficients of each audio channel within said block which are identified as shorter transform block by said long and shorter transform block information to form a left mixed down for shorter transform block and a right mixed down for shorter transform block; (c) inverse transforming each of said left mixed down for long transform block, said right mixed down for long transform block, said left mixed down for shorter transform block, and said right mixed down for shorter transform block to produce a left mixed down long inverse transformed block, a right mixed down long inverse transformed block, a left mixed down shorter inverse transformed block, and a right mixed down shorter inverse transformed block respectively; (d) adding said left mixed down long inverse transformed block and said left mixed down shorter inverse transformed block to form a left total mixed down; and (e) adding said right mixed down long inverse transformed block and said right mixed down shorter inverse transformed block to form a right total mixed down.
 2. A method according to claim 1, wherein said block decoding process comprises the steps of: (a) parsing said multi-channel audio bitstream to obtain bit allocation information on each audio channel within said block; (b) unpacking quantized frequency coefficients from said block using said bit allocation information; and (c) de-quantizing said quantized frequency coefficients to obtain said frequency coefficients using said bit allocation information.
 3. A method according to claim 2, further including a post-processing step comprising: (a) subjecting said left total mixed down to a window overlap/add process wherein the samples within said left total mixed down are weighted, de-interleaved, overlapped and added to samples of a previous block; (b) subjecting said right total mixed down to a window overlap/add process wherein the samples within said right total mixed down are weighted, de-interleaved, overlapped and added to samples of a previous block; and (c) subjecting the results of the window overlap/add process to an output process wherein said results of the window overlap/add process are formatted and outputted.
 4. An apparatus for decoding a multi-channel audio bitstream comprising means for block decoding said multi-channel audio bitstream to obtain frequency coefficients of each audio channel within each block, means for unpacking long and shorter transform block information for each audio channel within said block, and means for determining downmixing coefficients for each audio channel within said multi-channel audio bitstream, the apparatus including: (a) means for downmixing said frequency coefficients of each audio channel identified as long transform block by said long and shorter transform block information to form a left mixed down for long transform block and a right mixed down for long transform block; (b) means for downmixing said frequency coefficients of each audio channel identified as shorter transform block by said long and shorter transform block information to form a left mixed down for shorter transform block and a right mixed down for shorter transform block; (c) means for inverse transforming each of said left mixed down for long transform block, said right mixed down for long transform block, said left mixed down for shorter transform block, and said right mixed down for shorter transform block to produce a left mixed down long inverse transformed block, a right mixed down long inverse transformed block, a left mixed down shorter inverse transformed block, and a right mixed down shorter inverse transformed block respectively; (d) means for adding said left mixed down long inverse transformed block and said left mixed down shorter inverse transformed block to form a left total mixed down; and (e) means for adding said right mixed down long inverse transformed block and said right mixed down shorter inverse transformed block to form a right total mixed down.
 5. An apparatus according to claim 4, wherein said means for block decoding comprises: (a) means for parsing said multi-channel audio bitstream to obtain bit allocation information on each audio channel within said block; (b) means for unpacking quantized frequency coefficients from said block using said bit allocation information; and (c) means for de-quantizing said quantized frequency coefficients to obtain said frequency coefficients using said bit allocation information.
 6. An apparatus according to claim 5, further including means for performing a post-processing process comprising: (a) means for subjecting said left total mixed down to a window overlap/add process wherein the samples within said left total mixed down are weighted, de-interleaved, overlapped and added to samples of a previous block; (b) means for subjecting said right total mixed down to a window overlap/add process wherein the samples within said right total mixed down are weighted, de-interleaved, overlapped and added to samples of a previous block; and (c) means for subjecting the results of said window overlap/add process to an output process wherein said results of the window overlap/add process are formatted and outputted.